In filtering and smoothing, information carrying data are extracted from noisy
observations.  The  formalization  and  solution  of  the  filtering  and  smoothing  problems  are 
well  established,  when  the  joint  process  that  characterizes  the  relationship  between 
information  and  noise  data  sequences  is  statistically  well  known  (see  Kalman  (1960, 
1963),  Kolmogorov  (1941),  and  Wiener  (1949)),  or  parametrically  known.  Linear  filtering 
and  smoothing  operations  are  then  by  far  the  most  widely  used,  due  to  their  simplicity  in 
implementation. In practice, however, occasional extremely erroneous
data values, called outliers, are frequently observed. Furthermore, linear data operations
are  notoriously  nonresistant  to  such  outliers,  inducing  dramatic  performance  instabilities. 
The  purpose  of  this  paper  is  to  establish  a  theory  for  outlier  resistant  filtering  and 
smoothing  procedures,  and  to  provide  specific  such  data  operations  for  Gaussian 
information  processes,  and  additive,  nominally  Gaussian,  noise  processes.  The  initial 
steps  of  our  presentation  are  based  on  the  theory  of  qualitative  robustness  (see  Boente  et 
al  (1982),  Cox  (1978),  Hampel  (1971),  Papantoni-Kazakos  and  Gray  (1979),  and 
Papantoni-Kazakos (1981, 1984a, 1984b, 1987)). Our approach to the pertinent
performance criteria follows that of Hampel et al (1986).

Problems  of  nonlinear  filtering  are  considered  in  the  paper  by  Masreliez  and  Martin 
(1977).  In  particular,  the  above  authors  present  a  robustification  procedure  for  Kalman 
filters  operating  on  the  outputs  of  linear  dynamical  systems.  Discussion  of  their  results 
and comparisons with ours are given in Sections 4 and 6 of this paper.

A general theory and methodology for nonlinear smoothers, acting on stationary
processes, is developed by Mallows (1980). The issue of primary concern there is the
decomposition of smoothers into linear and nonlinear parts and the study of their
properties.  Furthermore,  the  problem  of  outlier  resistance  is  examined,  using  as  indicator 
of  resistance  an  extension  of  Hampel’s  concept  of  the  breakdown  point.  However, 
explicit  design  issues  are  not  undertaken.  Relevant  results  on  the  design  and  analysis  of 
specific  outlier  resistant  filters  and  smoothers  for  stationary  processes  can  be  found  in 
Tsaknakis  (1986). 

In  section  2  of  this  paper,  we  first  present  a  formalization  of  the  filtering  and 
smoothing  problem  under  consideration.  Then,  we  define  outlier  resistance  for  filtering 
and  smoothing  operations  and  present  certain  sufficient  conditions  for  resistance  of  such 
operations.  In  section  3,  a  two  person  game  formalization  is  adopted  for  fixed  finite 
length  operations  and  the  corresponding  least  favorable  structure  is  derived.  In  Section 
4,  the  above  structure  is  used  for  the  design  of  a  causal  recursive  filtering  operation  when 
the  nominal  information  process  is  autoregressive  and  the  nominal  noise  process  is  i.i.d. 
Then the asymptotic properties of the resulting operation are studied in a stationary
environment,  in  terms  of  asymptotic  outlier  resistance,  asymptotic  stationarity  and 
asymptotic  mean  square  error  at  the  nominal  model. 

In  section  5,  we  define  the  breakdown  point  and  the  influence  function  of  a  filtering 
or  smoothing  operation.  Both  these  quantities  are  defined  in  such  a  way  as  to  reflect 
important  sensitivity  aspects  of  the  mean  square  error,  induced  by  the  filtering  or 
smoothing  operation,  to  the  action  of  outliers.  Then,  we  continue  with  the  explicit 
evaluation  and  study  of  the  breakdown  point  and  influence  function  of  the  filter  presented 
in  section  4.  Section  6  is  devoted  to  the  numerical  evaluation  and  comparison  of  the 
proposed  filter  in  relationship  to  an  existing  one,  for  specific  numerical  examples. 
Finally, in section 7, we briefly present some conclusions.

2.  Preliminaries 

We consider real-valued discrete-time information and noise stochastic processes,
denoted respectively by $\{X_n, n \in Z\}$, $\{W_n, n \in Z\}$, where $Z$ is the set of integers. The
observation process $\{Y_n, n \in Z\}$ is given by the equation

$$Y_n = X_n + W_n, \quad n \in Z \tag{1}$$

It will be assumed that the information and noise processes are independent. Then a
complete statistical description of the model (1) is provided by the probability measures of
$\{X_n\}$, $\{W_n\}$, denoted by $\mu_S$, $\mu_N$ respectively. The probability measure of the
observation process $\{Y_n\}$, denoted by $\mu_Y$, is expressed as the convolution $\mu_Y = \mu_S * \mu_N$
and the joint probability measure of $\{Y_n, X_n\}$, denoted by $\mu$, is expressed as the product
$\mu(\{Y_n, X_n\}) = \mu_S(\{X_n\})\,\mu_N(\{Y_n - X_n\})$. Assuming that $\{X_n\}$ has finite second order
moments, let us consider the minimum mean square estimation of the information
process value $X_0$ given a finite length $l$ observation sequence $\{y_i, \ldots, y_{i+l-1}\}$, denoted as $y^l$
for short. If $i+l-1 \le 0$, we refer to causal filtering or simply filtering. If $i+l-1 > 0$ we refer
to noncausal filtering or smoothing. Given the measure $\mu$, the minimum mean square
estimator, $\hat{X}_0$, of $X_0$ is the conditional expectation

$$\hat{X}_0(y^l) = E\{X_0 \mid y^l, \mu\} \tag{2}$$

which is a function of the sequence $y^l$ whose specific form is determined by $\mu$. If $\mu$ is
Gaussian, $\hat{X}_0(y^l)$ is an affine transformation of $y^l$. The mean square error induced by $\hat{X}_0$
is denoted by $e(\mu, \hat{X}_0)$, and is a functional of $\mu$ and $\hat{X}_0$ given by the expression

$$e(\mu, \hat{X}_0) = E\{[X_0 - \hat{X}_0(Y^l)]^2 \mid \mu\} \tag{3}$$

The occurrence of occasional erroneous values in the noise process $\{W_n\}$, called
outliers, induces uncertainties in the description of the measure $\mu_N$. That induces further
uncertainties in the measures $\mu_Y$ and $\mu$. The initial issue here is the qualitative
characterization of those uncertainties. A particularly useful tool for describing
uncertainties of probability measures is the Prohorov distance (see Hampel (1971), Boente
et al (1982), Papantoni-Kazakos (1987, 1984b)), whose definition is given below.

Let $\rho(\cdot,\cdot)$ be a metric in $R^n$, and let $\nu_1$, $\nu_2$ be probability measures defined on the
Borel field of $(R^n, \rho)$. Let $N$ be the class of all joint measures, $\nu$, whose marginals are $\nu_1$
and $\nu_2$. The Prohorov distance $\Pi_\rho(\nu_1, \nu_2)$ is defined as follows

$$\Pi_\rho(\nu_1, \nu_2) = \inf_{\nu \in N} \inf\{\delta > 0 : \nu(\alpha, \beta : \rho(\alpha,\beta) > \delta) < \delta\} \tag{4}$$

where $\alpha$, $\beta$ denote elements of $R^n$.

The selection of the metric $\rho(\cdot,\cdot)$ reflects the pattern according to which the outliers
corrupt the nominal process. For the purpose of this paper, we select a metric which
corresponds to outliers occurring in batches, or bursts of size m, m being a fixed design
parameter. Such a metric is defined as follows (see Papantoni-Kazakos (1984a, 1984b)).

For $\alpha, \beta \in R^n$, let $\bar{\alpha}$, $\bar{\beta}$ be sequences generated by repetitions of $\alpha$ and $\beta$. Also, let $\alpha_j^k$
denote $(\alpha_j, \ldots, \alpha_k)$, $j \le k$. Then, we define the metric $\rho_{n,m}(\cdot,\cdot)$ in $R^n$ as

$$\rho_{n,m}(\alpha,\beta) = \inf\{\delta > 0 : n^{-1}\, \#[\, i : \gamma_m(\bar{\alpha}_i^{i+m-1}, \bar{\beta}_i^{i+m-1}) > \delta,\ i = 1, \ldots, n \,] < \delta\} \tag{5}$$

where the auxiliary metric $\gamma_m(\cdot,\cdot)$ is defined as

$$\gamma_m(\alpha', \beta') = m^{-1} \sum_{i=1}^{m} |\alpha'_i - \beta'_i|, \quad \alpha', \beta' \in R^m$$
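The batch metric above can be evaluated directly for two finite sequences. The following is a minimal sketch, not taken from the paper: the function names `gamma_m` and `rho_nm` are hypothetical, "repetitions" of a sequence is read as its cyclic extension, and m is assumed not to exceed n. The inner inequality is tested as non-strict, which leaves the infimum unchanged.

```python
def gamma_m(a, b):
    """Auxiliary metric: mean absolute difference over one length-m window."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rho_nm(alpha, beta, m):
    """Batch metric rho_{n,m} between two length-n sequences (assumes m <= n)."""
    n = len(alpha)
    # Assumption: "sequences generated by repetitions" = cyclic extension.
    a_ext = list(alpha) + list(alpha[:m - 1])
    b_ext = list(beta) + list(beta[:m - 1])
    gammas = [gamma_m(a_ext[i:i + m], b_ext[i:i + m]) for i in range(n)]

    def frac_exceeding(delta):
        # Fraction of length-m windows whose gamma_m distance exceeds delta.
        return sum(g > delta for g in gammas) / n

    # The infimum is attained either at a window distance or at a multiple
    # of 1/n, so it suffices to scan those candidates in increasing order.
    candidates = sorted(set(gammas) | {k / n for k in range(n + 1)})
    for delta in candidates:
        if frac_exceeding(delta) <= delta:
            return delta
    return candidates[-1]


nominal = [0.0] * 10
corrupted = [100.0] + [0.0] * 9   # one outlier of large amplitude
print(rho_nm(nominal, corrupted, 1), rho_nm(nominal, corrupted, 2))
```

A single outlier affects at most m windows, so the distance stays bounded by m/n no matter how large the outlier amplitude is; this is what makes the metric suitable for describing rare but arbitrarily large errors.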


In the sequel, we use the Prohorov distance in (4) with the metric (5) to give a
formal definition of outlier resistant estimators. Let $\mu_{oN}$, $\mu_{oY}$ denote the nominal
measures of the noise and observation processes respectively, and $\mu_o$ denote the nominal
joint measure of the observation and information processes. Also, let $\mu_S = \mu_{oS}$ be the
fixed information process measure.

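An estimator $\hat{X}_0(y^l)$ of $X_0$ from the observation sequence $y^l$ is called outlier
resistant or qualitatively robust at $\mu_{oN}$ if, for every $\eta > 0$, there is an $\varepsilon > 0$ such that

$$\Pi_{\rho_{n,m}}(\mu_{oN}, \mu_N) < \varepsilon \ \text{ implies } \ |e(\mu_o, \hat{X}_0) - e(\mu, \hat{X}_0)| < \eta$$

for every n.

Notice that $\mu_o$ and $\mu$ are fully determined from $\mu_{oN}$ and $\mu_N$.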

Considering stationary and ergodic processes, the limit $\lim_{n \to \infty} \Pi_{\rho_{n,m}}(\mu_{oN}, \mu_N)$ is equal
to the Prohorov distance $\Pi_{\gamma_m}(\mu_{oN}, \mu_N)$. Since the Prohorov distance $\Pi_{\gamma_m}(\cdot,\cdot)$ metrizes the
weak topology of the probability measures on $(R^m, \gamma_m)$, an estimator $\hat{X}_0(y^l)$ of length $l \le m$
is resistant, if it is pointwise continuous and bounded. Such estimators are constructed in
section 3. However, for $l > m$, these conditions are no longer sufficient; appropriate
resistant estimators of asymptotically large length are constructed in section 4.

Consider now the m-dimensional restriction of the nominal measure $\mu_{oN}$ and let it
be denoted by $\mu_{oN}^m$. Furthermore, assume that $\mu_{oN}^m$ is absolutely continuous with density
$f_{oN}^m$. Then, the $\varepsilon$-contaminated class of densities

$$\mathcal{F}_N^m(\varepsilon) = \{f_N^m = (1-\varepsilon) f_{oN}^m + \varepsilon h^m,\ h^m \text{ arbitrary m-dimensional density}\} \tag{7}$$

is contained in the class of measures $\mu_N^m$ satisfying $\Pi_{\gamma_m}(\mu_{oN}^m, \mu_N^m) \le \varepsilon$, for any $\varepsilon$, $0 < \varepsilon < 1$. The constant
$\varepsilon$ can be interpreted as the frequency of outlier occurrence.

The class $\mathcal{F}_N^m(\varepsilon)$ of noise densities induces the following class of joint 2m-dimensional
densities of the observation and information processes.

$$\mathcal{F}^m(\varepsilon) = \{f^m : f^m(y^m, x^m) = (1-\varepsilon) f_{oS}^m(x^m) f_{oN}^m(y^m - x^m) + \varepsilon f_{oS}^m(x^m) h^m(y^m - x^m),\ h^m \text{ arbitrary}\} \tag{8}$$

Class $\mathcal{F}^m(\varepsilon)$ contains all the necessary statistical information for constructing estimators
of length at most m and will be used in the forthcoming section as the model for
statistical contamination.


3.  Construction  of  Filtering  and  Smoothing  Operations  -  Step  1 

In this section we derive a finite length robust estimator of the information process
given an observation sequence of length $l \le m$, where m corresponds to outlier patterns, as
discussed in the previous section. The derivation is based on a two-person game
formulation of the estimation problem, with payoff function the induced mean square
error. To fix ideas, suppose that $X_0$ is to be estimated from a length $l$ observation
sequence $y_i^{i+l-1}$, denoted as $y^l$ for short (assume $i \le 0 \le i+l-1$). The joint density of $x_0$ and
$y^l$, denoted by $f(x_0, y^l)$, belongs to an $\varepsilon$-contaminated class obtained from the appropriate
restriction of the more general class $\mathcal{F}^m(\varepsilon)$, as defined in (8). We assume that the
information process is a fixed zero mean Gaussian process and that the nominal noise
process is also zero-mean Gaussian. Given an estimator $\hat{X}_0(y^l)$ and a joint density
$f(x_0, y^l)$, the mean square error $e(f, \hat{X}_0)$ of $\hat{X}_0$ at $f$ is the expectation $E\{[X_0 - \hat{X}_0(Y^l)]^2 \mid f\}$.
The objective is to find a density-estimator pair $(f^*, \hat{X}_0^*)$ that constitutes a saddle point:

$$e(f, \hat{X}_0^*) \le e(f^*, \hat{X}_0^*) \le e(f^*, \hat{X}_0) \quad \text{for every measurable } \hat{X}_0 \text{ and } f \in \mathcal{F}^m(\varepsilon) \tag{9}$$

Unfortunately, a saddle point solution of the above game for the class $\mathcal{F}^m(\varepsilon)$ does
not exist. In particular, the quantity $\inf_{\hat{X}_0} \sup_{f \in \mathcal{F}^m(\varepsilon)} e(f, \hat{X}_0)$ is strictly larger than
$\sup_{f \in \mathcal{F}^m(\varepsilon)} \inf_{\hat{X}_0} e(f, \hat{X}_0)$ and the latter supremum with respect to $f$ cannot be attained in $\mathcal{F}^m(\varepsilon)$.
This is due to the non-tightness of $\mathcal{F}^m(\varepsilon)$ which allows probability masses to escape to
infinity. For this reason we consider an enlargement of the class $\mathcal{F}^m(\varepsilon)$ to include all
densities of the form (we denote the enlarged class by the same symbol):


$$\mathcal{F}^m(\varepsilon) = \{f^m : f^m(y^m, x^m) = (1-\varepsilon) f_{oS}^m(x^m) f_{oN}^m(y^m - x^m) + \varepsilon f_{oS}^m(x^m) h^m(y^m),\ h^m \text{ arbitrary m-dimensional density}\} \tag{10}$$

The enlarged class $\mathcal{F}^m(\varepsilon)$ in (10) is equivalent to considering outliers affecting the
observation process directly, not via the additive noise process, as is the case with the
class in (8). However, the minimax value of the game for the class in (8) is the same as
the minimax value for the class in (10). Furthermore, a saddle point solution of the game
(9) always exists within the class $\mathcal{F}^m(\varepsilon)$ in (10). From now on we consider only the class
$\mathcal{F}^m(\varepsilon)$ as defined in (10), and we seek the saddle point solution of the game in this class.


From the results in Papantoni-Kazakos (1984a) we conclude that the saddle point of
the game can be found by solving

$$\sup_{f \in \mathcal{F}^m(\varepsilon)} \inf_{\hat{X}_0} e(f, \hat{X}_0) \tag{11}$$


The expression $\inf_{\hat{X}_0} e(f, \hat{X}_0)$ represents the minimum mean square error at the density $f$ and
can be written as $\sigma_0^2 - I(f)$, where $\sigma_0^2 = E(X_0^2)$ is the fixed variance of $X_0$, and

$$I(f) = E\{E^2\{X_0 \mid Y^l, f\} \mid f\} = \int_{R^l} \frac{\left( \int_{R^1} x_0 f(x_0, y^l)\, dx_0 \right)^2}{\int_{R^1} f(x_0, y^l)\, dx_0}\, dy^l \tag{12}$$

Considering the form of $f(x_0, y^l)$ in terms of the nominal and contaminating
densities, as derived from (10), and the zero mean assumption of the nominal densities,
the quantity $I(f)$ can be written as a functional of the $l$-dimensional restriction of the
density of the observation sequence $y^l$. Let us denote the latter density by $f_Y(y^l)$. After
some algebra, we obtain

$$I(f) = I(f_Y) = \int_{R^l} \frac{\left( (1-\varepsilon) f_{oY}(y^l)\, P^T y^l \right)^2}{f_Y(y^l)}\, dy^l \tag{13}$$

where $f_{oY}(y^l)$ is the nominal density of $y^l$ at the vector point $y^l$, given by the
convolution of the information density $f_{oS}^l$ and nominal noise density $f_{oN}^l$, and the inner
product $P^T y^l$ is the optimal linear estimator of $X_0$ from $y^l$ under nominal conditions (i.e.
for $\varepsilon = 0$). The density $f_Y(y^l)$ belongs to the class $\mathcal{F}_Y(\varepsilon)$, obtained from $\mathcal{F}^m(\varepsilon)$ as follows

$$\mathcal{F}_Y(\varepsilon) = \{f_Y(y^l) : f_Y(y^l) = (1-\varepsilon)\,(f_{oS}^l * f_{oN}^l)(y^l) + \varepsilon h^l(y^l)\}$$

Problem (11) can now be reduced to

$$\inf_{f_Y \in \mathcal{F}_Y(\varepsilon)} I(f_Y) \tag{14}$$


Although the class of densities $\mathcal{F}_Y(\varepsilon)$ is not tight (therefore not compact) in the
weak topology of all probability measures on the Borel $\sigma$-field of the metric space
$(R^l, \gamma_l)$, the infimum in (14) is attained in $\mathcal{F}_Y(\varepsilon)$. Furthermore, there is a unique member
of $\mathcal{F}_Y(\varepsilon)$ attaining that infimum, under the nominal assumptions discussed before. The
above assertions together with the explicit form of the infimum and the corresponding
estimator, constitute the statements of Theorem 1 below, whose proof is in the Appendix.


Let $\phi(x)$ and $\Phi(x)$ be the zero mean unit variance Gaussian density and cumulative
distribution, respectively. Let $H(\lambda, z)$, $\lambda > 0$ be the Huber function defined as

$$H(\lambda, z) = \max(-\lambda, \min(\lambda, z)) \tag{15}$$
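The Huber function in (15) is a symmetric clipping operation. As a small illustration (the name `huber` is hypothetical and not from the paper), it can be written as:

```python
def huber(lam, z):
    """Huber function H(lam, z) = max(-lam, min(lam, z)), for lam > 0."""
    return max(-lam, min(lam, z))


# Values inside [-lam, lam] pass through unchanged; larger ones are limited.
print(huber(2.0, 0.5), huber(2.0, 7.0), huber(2.0, -7.0))
```

The function is continuous and bounded by lam in magnitude, which are the two properties invoked in Section 2 as sufficient for outlier resistance of estimators of length at most m.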


Finally, let $r$ be the nominal variance of the linear form $P^T Y^l$, i.e.

$$r = E\{(P^T Y^l)^2\} = \int_{R^l} (P^T y^l)^2 f_{oY}(y^l)\, dy^l.$$

Then, we express Theorem 1 as follows.

Theorem 1

(i) There is a unique saddlepoint solution $(f^*, \hat{X}_0^*)$ of the game (9).

(ii) The saddlepoint observation density $f_Y^*$ and estimator $\hat{X}_0^*$ are given by the equations

We note that the estimator $\hat{X}_0^*$ in (17) above is a truncated version of the linear,
nominally optimal mean square estimator $P^T y^l$. The truncation constant $\lambda$ is proportional
to the square root of the quantity $r$, which is the variance gain in estimating $X_0$ from $y^l$
under nominal conditions ($\varepsilon = 0$). The proportionality factor $c$ tends to infinity for $\varepsilon \to 0$.
In the latter case, the estimator (17) becomes identical to the nominally optimal mean
square estimator.


There are interesting similarities and differences between the estimator in (17)
and the classical robust parameter estimator of Huber (1964). Both estimators introduce
the same form of nonlinearity to limit the influence of bad observations. However, while
Huber's estimator applies the nonlinearity on each one of the observation data, the
estimator derived here applies a similar nonlinearity on a linear combination of the
observation data. Furthermore, the form of the least favorable density derived in (16) has
heavier tails than the Gaussian by the linear factor $|P^T y^l|$, while, in the robust parameter
estimation problem, the corresponding least favorable density has much heavier
exponential tails. Regarding these comparisons, it should be pointed out that Huber's
result is based on the maximum likelihood estimation of the unknown mean of a
contaminated distribution, while the result of Theorem 1 is based on a Bayesian
estimation of a random process corrupted by contaminated noise and with the mean
square error as performance criterion.


Regarding qualitative robustness, we note that for any $\varepsilon$, $0 < \varepsilon < 1$, the estimator $\hat{X}_0^*$ is
both continuous and bounded, thus satisfying the conditions for outlier resistance stated in
the previous section for finite length estimators.


The mean square error induced by $\hat{X}_0^*$ at the least favorable density $f_Y^*$ is equal to
$\sigma_0^2 - I(f_Y^*)$. This is the largest possible error within the class $\mathcal{F}^m(\varepsilon)$ and by substitution
we obtain

$$e(f_Y^*, \hat{X}_0^*) = \sigma_0^2 \left[ 1 - (1-\varepsilon)(2\Phi(c) - 1)\, q^2 \right]$$

where $q = \sigma_0^{-1} \sqrt{r}$. Let $e(f_o, \hat{X}_0^*)$ be the mean square error induced by the robust estimator
$\hat{X}_0^*$ at the nominal Gaussian density. Also, let $e^o$ be the nominally optimal mean square
error. Then, after some computations we obtain

$$e^o = \sigma_0^2 (1 - q^2)$$

$$e(f_o, \hat{X}_0^*) = e^o + 2r \left( \Phi(-c)(1 + c^2) - c\,\phi(c) \right)$$

The second term of the right hand side of the above equation is always positive and
represents the performance loss that is incurred if the robust nonlinear estimator $\hat{X}_0^*$ is
applied, instead of the linear nominally optimal one.
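The three error expressions above are easy to evaluate numerically once the factor c is known. The sketch below is illustrative only: it assumes c solves $c^{-1}\phi(c) + \Phi(c) = (2-\varepsilon)/(2(1-\varepsilon))$, which is the normalization implied by a least favorable density of the form $(1-\varepsilon) f_{oY} \max(1, |P^T y^l|/\lambda)$ with $\lambda = c\sqrt{r}$; that form is inferred from the surrounding text and not quoted from Theorem 1. All function names are hypothetical.

```python
import math


def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard Gaussian cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_factor(eps, lo=1e-9, hi=40.0):
    """Solve phi(c)/c + Phi(c) = (2-eps)/(2(1-eps)) for c (assumed form)."""
    target = (2.0 - eps) / (2.0 * (1.0 - eps))
    # The left-hand side is strictly decreasing in c, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) / mid + Phi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minimax_errors(sigma0_sq, r, eps):
    """Return (c, worst-case error, nominal optimum, robust error at nominal)."""
    c = truncation_factor(eps)
    worst = sigma0_sq - (1.0 - eps) * (2.0 * Phi(c) - 1.0) * r
    nominal_opt = sigma0_sq - r                      # sigma0^2 (1 - q^2)
    loss = 2.0 * r * (Phi(-c) * (1.0 + c * c) - c * phi(c))
    return c, worst, nominal_opt, nominal_opt + loss


# Hypothetical numbers: unit signal variance, variance gain r = 0.5.
print(minimax_errors(1.0, 0.5, 0.1))
```

For $\varepsilon = 0.1$ the solver gives c close to 1.14, and the three errors are ordered as nominal optimum, robust error at the nominal model, worst-case error, which matches the interpretation of the second term as a performance loss.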

4.  Construction  of  Filtering  and  Smoothing  Operations  -  Step  2 

We  now  consider  the  case  when  the  number  of  observation  data  is  larger  than  the 
parameter  m.  For  this  case  and  for  arbitrary  nominal  information  and  noise  processes, 
results  concerning  the  design  and  study  of  appropriate  nonlinear  filtering  and  smoothing 
operations  can  be  found  in  Tsaknakis  (1986).  For  the  purpose  of  this  paper  we  will  focus 
on  autoregressive  Gaussian  information  processes  and  white  Gaussian  nominal  noise 
processes. 

Let the nominal information and observation processes $\{X_n\}$, $\{Y_n\}$ be given by the
equations

$$X_n = a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k} + V_n$$
$$Y_n = X_n + W_n \tag{19}$$


where $\{V_n\}$, $\{W_n\}$ are mutually independent, i.i.d. and zero mean Gaussian, with
variances $\sigma_x^2$ and $\sigma_w^2$ respectively. Upon defining

the  nominal  model  can  be  described  in  the  following  vector  form 


$$U_n = A U_{n-1} + B V_n$$
$$Y_n = B^T U_n + W_n \tag{21}$$
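To make the vector form (21) concrete, the sketch below builds A and B from the autoregressive coefficients and simulates the nominal model. It assumes the standard companion form, with $U_n = [X_n, \ldots, X_{n-k+1}]^T$, the coefficients in the first row of A, and $B = [1, 0, \ldots, 0]^T$; the names `companion_form` and `simulate` are hypothetical, and the arguments `sigma_x`, `sigma_w` are standard deviations rather than variances.

```python
import numpy as np


def companion_form(a):
    """Return (A, B) for X_n = a_1 X_{n-1} + ... + a_k X_{n-k} + V_n."""
    k = len(a)
    A = np.zeros((k, k))
    A[0, :] = a                    # first row holds the AR coefficients
    A[1:, :-1] = np.eye(k - 1)     # sub-diagonal shifts the state down
    B = np.zeros(k)
    B[0] = 1.0
    return A, B


def simulate(a, sigma_x, sigma_w, n, seed=0):
    """Simulate n steps of the nominal model; returns (x, y) arrays."""
    rng = np.random.default_rng(seed)
    A, B = companion_form(a)
    u = np.zeros(len(a))
    xs, ys = [], []
    for _ in range(n):
        u = A @ u + B * rng.normal(0.0, sigma_x)    # state update
        xs.append(B @ u)                            # X_n = B^T U_n
        ys.append(B @ u + rng.normal(0.0, sigma_w)) # Y_n = X_n + W_n
    return np.array(xs), np.array(ys)


A, B = companion_form([0.6, 0.07, -0.06])
print(A)
```

With this choice the first component of the state is the current information value, so estimating the whole vector also yields the estimate of $x_0$.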


Writing the system (19) in the vector form (21) has the advantage of the recursive Kalman
filtering relationships for the nominal model. We want to estimate $x_0$ given observations
$y_0, y_{-1}, \ldots, y_{-l+1}$ for any value of $l$, when the observation process is corrupted by
outliers occurring in batches of size m. When $l \le m$, we apply the minimax estimator
derived in the previous section. When $l > m$, we consider estimating the entire vector $U_0$
given the above measurements, and we define the following recursive estimator


$$\hat{u}_{0,l} = A^m \hat{u}_{-m,l} + g_m\!\left( \sum_{i=-m+1}^{0} b_{i,l}\, (y_i - B^T A^{m+i} \hat{u}_{-m,l}) \right) \tag{22}$$

In (22), $\hat{u}_{0,l}$, $\hat{u}_{-m,l}$ denote the estimates of the vectors $u_0$, $u_{-m}$ given observation
data $(y_0, y_{-1}, \ldots, y_{-l+1})$, $(y_{-m}, y_{-m-1}, \ldots, y_{-m-l+1})$, respectively. Also,
$\{b_{i,l},\ i = 0, \ldots, -l+1\}$ are the vector-valued coefficients of the linear m-step recursion of the
Kalman filter operation on the system (21). Finally, the vector-valued function $g_m(\cdot)$ is
defined as follows.

$$g_m(x) = [H(\lambda_{0,m}, x_1),\ H(\lambda_{-1,m}, x_2),\ \ldots,\ H(\lambda_{-k+1,m}, x_k)]^T, \quad x = [x_1, \ldots, x_k]^T \tag{23}$$

where

$$\lambda_{-j,m} = c\, [r_{-j,m}]^{1/2}, \qquad c^{-1}\phi(c) + \Phi(c) = \frac{2-\varepsilon}{2(1-\varepsilon)} \tag{24}$$

$r_{-j,m}$ : variance gain in estimating $x_{-j}$ given $\{y_i,\ -m+1 \le i \le 0\}$
under nominal conditions.

$H(\cdot,\cdot)$ : the Huber function as defined in (15).


From (22), and in view of the definitions (23) and (24), it is evident that every scalar
nonlinearity is applied to linear combinations of at most m observation data.
Furthermore, if $\varepsilon \to 0$, the positive constants $\{\lambda_{j,m},\ j = 0, \ldots, -k+1\}$ tend to infinity and the
estimator in (22) becomes identical to the estimator that is optimal at the nominal Gaussian
model. For $\varepsilon > 0$, the above constants are all finite and they determine the amount of
limiting which is introduced in each entry of the innovations term of the Kalman filter.
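One m-step update of the recursion (22) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `robust_block_update` is hypothetical, the Kalman coefficients `b` and the limits `lam` are taken as given inputs (the paper obtains them from the nominal Kalman recursion and from (24)), and the observations are assumed to be ordered from $y_{-m+1}$ to $y_0$.

```python
import numpy as np


def robust_block_update(A, B, b, lam, u_prev, y_block):
    """One m-step update: A^m u_prev + g_m(sum_i b_i (y_i - B^T A^(m+i) u_prev)).

    b       : array of shape (m, k); row j holds b_i for i = -m+1+j
    lam     : array of shape (k,); per-entry truncation constants
    u_prev  : previous state estimate (k,), i.e. the estimate of u_{-m}
    y_block : the m observations y_{-m+1}, ..., y_0
    """
    m = len(y_block)
    innovation = np.zeros_like(u_prev, dtype=float)
    A_pow = np.eye(A.shape[0])
    for j in range(m):
        A_pow = A_pow @ A                       # A^(j+1) = A^(m+i)
        predicted = B @ A_pow @ u_prev          # nominal prediction of y_i
        innovation += b[j] * (y_block[j] - predicted)
    # g_m: entrywise Huber limiting of the accumulated innovations term.
    return A_pow @ u_prev + np.clip(innovation, -lam, lam)


# Hypothetical scalar example with one large outlier in the block.
A = np.array([[0.5]])
B = np.array([1.0])
b = np.array([[0.1], [0.4]])
u_prev = np.array([1.0])
print(robust_block_update(A, B, b, np.array([0.3]), u_prev, [0.5, 100.0]))
print(robust_block_update(A, B, b, np.array([np.inf]), u_prev, [0.5, 100.0]))
```

With finite limits the outlier moves the estimate by at most `lam` away from the prediction $A^m \hat{u}_{-m,l}$; with infinite limits the update reduces to the linear one and follows the outlier.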

A  filter  similar  to  (22)  was  earlier  considered  by  Masreliez  and  Martin  (1977),  for 
the  case  m=l.  The  above  authors  applied  the  nonlinearity  on  a  transformed  version  of 
the  innovations  process.  However,  their  analysis  was  based  on  an  ad  hoc  assumption  that 
the  process  formed  by  the  residuals  is  Gaussian.  Then,  using  this  assumption,  they 
derived  a  covariance  recursion,  avoiding  thus  the  problem  of  nested  nonlinearities  in  the 
actual nonlinear recursion. Later on, we will numerically demonstrate the performance
of the above filter as compared with that in (22), as analyzed by the methods we present
in the sequel.

Here,  we  are  primarily  interested  in  the  study  of  the  asymptotic  properties  of  the 
estimator  in  (22)  when  the  number  of  observations  tends  to  infinity,  and  the  nominal 
information  process  is  stationary.  The  condition  for  stationarity  of  the  latter  process  is 
that  all  the  roots  of  the  polynomial  equation 

$$x^k - a_1 x^{k-1} - \cdots - a_k = 0$$

have  magnitudes  less  than  one. 
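The stationarity condition can be checked numerically by computing the roots of this polynomial. The helper names below are hypothetical; the check uses `numpy.roots`, which takes the coefficients in order of decreasing power.

```python
import numpy as np


def ar_roots(a):
    """Roots of x^k - a_1 x^(k-1) - ... - a_k for AR coefficients a."""
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.roots(coeffs)


def is_stationary(a):
    """True when every root has magnitude strictly less than one."""
    return bool(np.all(np.abs(ar_roots(a)) < 1.0))


# Third order example with a_1 = 0.6, a_2 = 0.07, a_3 = -0.06.
print(ar_roots([0.6, 0.07, -0.06]), is_stationary([0.6, 0.07, -0.06]))
```

For these coefficients the polynomial factors as $(x - 0.5)(x - 0.4)(x + 0.3)$, so the process is stationary with largest root magnitude 0.5.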

The  first  issue  is  the  asymptotic  outlier  resistance  of  the  estimator.  Theorem  2 
below,  whose  proof  is  in  the  Appendix,  establishes  that  property. 

Theorem  2 

Let $\{X_n\}$ in (19) have finite variance and be stationary. Then, the filter in (22) is
asymptotically ($l \to \infty$) outlier resistant for mutually independent m-size batches of
outliers.

The next issue is the asymptotic stationarity of the filter itself when $l \to \infty$. In order
to study that we consider the residual process


$$U_0 - \hat{U}_{0,l} = A^m (U_{-m} - \hat{U}_{-m,l}) + \sum_{i=-m+1}^{0} A^{-i} B V_i - g_m\!\left( \sum_{i=-m+1}^{0} b_{i,l} \left[ (Y_i - B^T A^{m+i} U_{-m}) + B^T A^{m+i} (U_{-m} - \hat{U}_{-m,l}) \right] \right) \tag{25}$$

For $l$ going to infinity along multiples of m, the above residual process becomes
asymptotically stationary. This will be shown by establishing a more general result
regarding the asymptotic stationarity of Markov processes with Euclidean state space.
The latter result is expressed in Theorem 3 below, whose proof is in the Appendix. In the
sequel we denote $\|x\| \triangleq \max_i |x_i|$ for $x = (x_1, \ldots, x_n)^T$.


Theorem 3

Let $f(x, v) : R^k \times R^l \to R^k$ be measurable. Let $\{X_n,\ n \ge 0\}$ be a stochastic process in
$R^k$ defined by

$$X_{n+1} = f(X_n, V_n), \quad n \ge 0 \tag{26}$$

where $\{V_n,\ n \ge 0\}$ is an i.i.d. process in $R^l$, independent of $X_0$, with distribution $P(\cdot)$.
Then, if there is a positive $\xi$, such that $\xi < 1$ and

$$\int \|f(x, v) - f(x', v)\|\, dP(v) \le \|x - x'\| \cdot \xi, \quad \forall\, x, x' \in R^k \tag{27}$$

the process $\{X_n\}$ is asymptotically stationary.


The residual process (25) satisfies the conditions of Theorem 3. This can be shown
by using the properties of the nonlinearity $g_m(\cdot)$, namely that
$\|g_m(x) - g_m(x')\| \le \|x - x'\|$, and certain standard properties of the linear filtering
coefficients $\{b_i\}$ and the stationary matrix A. As a result, the marginal probability
density of the residuals converges weakly to a steady state density. The covariance of
that steady state density is what we call the asymptotic mean square error induced by the
filter (22) at the nominal Gaussian model. In fact, it is even true, as a result of Theorem
3, that the sequence of covariances of the residual process converges to the steady state
covariance.

The  computation  of  the  steady  state  covariance  is  an  important  component  in  the 
study  of  the  asymptotic  properties  of  the  proposed  filter.  It  is  interesting  to  point  out  that 
the  deviation  of  the  robust  filter  from  the  nominally  optimal  linear  filter  builds  up  as  the 
number  /  of  observations  increases,  and  we  would  like  to  see  what  is  the  performance  for 
asymptotically  large  number  of  observations,  as  compared  to  the  nominally  optimal 
asymptotic  performance.  The  difference  in  performance  will  clearly  exhibit  the  price 
that  one  has  to  pay  for  achieving  robustness  in  this  context. 

Due to the nature of the nonlinear residual recursion, the computation of the
asymptotic covariance is a difficult and tedious task. As analytic, or closed form,
expressions seem impossible to obtain, we approached the problem by deriving upper
and lower bounds. The derivation was based on the asymptotic stationarity of the
residual process, which implies $\lim_{l \to \infty} E\{(U_0 - \hat{U}_{0,l})^2\} = \lim_{l \to \infty} E\{(U_{-m} - \hat{U}_{-m,l})^2\}$, and the
approximation of the square of the second term in (25) by upper and lower quadratic
bounds in terms of $U_{-m} - \hat{U}_{-m,l}$. The bounds were finally obtained by solving two fixed
point matrix equations of the form $X = A^m X (A^m)^T + G(X)$. The two bounds are found to
be tight enough and approaching each other as the design parameter m becomes larger, at
the exponential rate $|\beta_{\max}(A)|^{2m}$, where $\beta_{\max}(A)$ is the largest magnitude eigenvalue of
the matrix A. As a result, a reasonably good estimate of the asymptotic covariance was
obtained. We defer the discussion of this issue until section 6 where the above results are
numerically demonstrated and analyzed.

5.  Breakdown  Point.  Influence  Function. 

Let us consider the case, frequently observed in practice, of independent and additive
outliers. In particular, let the noise sequence $\{\ldots, W_{-1}, W_0, W_1, \ldots\}$ be such that each of
its elements is generated by the nominal Gaussian noise process, with probability $1-\delta$,
and it is instead equal to some deterministic value, v, with probability $\delta$, $0 < \delta < 1$. Let the
value v occur with probability $\delta$, independently per noise datum. Given the above outlier
model, given some asymptotic filtering or smoothing operation, $\hat{X}_0$, let $e(f_o, \delta, v, \hat{X}_0)$
denote the induced mean squared error. That is, if $f_o$ represents the overall nominal
Gaussian model, then $e(f_o, \delta, v, \hat{X}_0) = E\{(X_0 - \hat{X}_0)^2 \mid f_o, \delta, v\}$. Let us denote
$e(f_o, \delta, \hat{X}_0) \triangleq \lim_{v \to \infty} e(f_o, \delta, v, \hat{X}_0)$, and let there exist some value $\delta^*$, $0 < \delta^* < 1$, such that

$$e(f_o, \delta, \hat{X}_0) > E\{X_0^2 \mid f_o\}; \quad \forall\, \delta > \delta^*$$
$$e(f_o, \delta, \hat{X}_0) < E\{X_0^2 \mid f_o\}; \quad \forall\, \delta < \delta^*$$

Then, the value $\delta^*$ is called the breakdown point of the asymptotic operation $\hat{X}_0$. The
breakdown point clearly represents the maximum frequency of independent,
asymptotically large in amplitude outliers that the operation $\hat{X}_0$ can tolerate before it
becomes worthless; that is, before it starts inducing a mean squared error larger
than that induced when no observation data are available. We note that the breakdown
points of the nominally optimal linear filtering and smoothing operations are easily
found to equal zero.

Let us now consider a generalization of the outlier model presented above. In
particular, let us consider the case where independent, size m blocks of outliers may
occur. Then, each block occurs with probability $\delta$, and it consists of a value v per datum
in the block. Given some filtering or smoothing operation $\hat{X}_0$, we then denote the
induced mean squared error by $e_m(f_o, \delta, v, \hat{X}_0)$. Denoting by $e(f_o, \hat{X}_0)$ the mean squared error
in the absence of the above outlier model, we denote $J_{m,\delta}(v) \triangleq e_m(f_o, \delta, v, \hat{X}_0) - e(f_o, \hat{X}_0)$.
We call $J_{m,\delta}(v)$ the variation function at $\delta$. Given $\delta$, the variation function exhibits the
difference between the mean squared error, when the outlier value is v and the frequency
of the outlier blocks is $\delta$, and the mean squared error in the absence of outliers. We call
$I_{m,\delta}(v) \triangleq \delta^{-1} J_{m,\delta}(v)$ the normalized variation function at $\delta$, and we call
$I_m(v) \triangleq \lim_{\delta \to 0} I_{m,\delta}(v)$ the influence function. The influence function is the slope of the
variation function at $\delta = 0$, and it exhibits the effect of the outlier value v, at
asymptotically small outlier frequencies $\delta$.

Regarding the computation of the breakdown point and the influence function $I_m(v)$
of the filtering operation in (22), an approach similar to that used for the asymptotic
variance was adopted. In particular, upper and lower bounds were computed for both
quantities. These bounds approach each other at the same exponential rate $|\beta_{\max}(A)|^{2m}$.

The influence function, $I^o(v)$, of the nominally optimal linear filter was also
computed for comparison. The latter has a closed form expression which is a quadratic
function of the outlier value v.
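The quadratic dependence on v is easy to see in the simplest possible setting. The sketch below is an illustration under stated assumptions, not the paper's closed form: it takes a memoryless linear estimator $P y_0$ of a zero mean $X_0$ with $y_0 = x_0 + w_0$, replaces the noise by the value v with probability $\delta$, and evaluates the variation and normalized variation functions exactly. All names are hypothetical.

```python
def linear_mse(P, sigma0_sq, sigma_w_sq, delta, v):
    """E[(X - P(X + N))^2] when N is nominal w.p. 1-delta and equals v w.p. delta."""
    noise_power = (1.0 - delta) * sigma_w_sq + delta * v * v
    return (1.0 - P) ** 2 * sigma0_sq + P ** 2 * noise_power


def normalized_variation(P, sigma0_sq, sigma_w_sq, delta, v):
    """I_delta(v) = J_delta(v) / delta, with J the excess error due to outliers."""
    J = (linear_mse(P, sigma0_sq, sigma_w_sq, delta, v)
         - linear_mse(P, sigma0_sq, sigma_w_sq, 0.0, v))
    return J / delta


def influence_linear(P, sigma_w_sq, v):
    """Limit of the normalized variation as delta -> 0: P^2 (v^2 - sigma_w^2)."""
    return P ** 2 * (v * v - sigma_w_sq)


# Hypothetical numbers: sigma0^2 = 4/3, sigma_w^2 = 1, P = sigma0^2/(sigma0^2 + sigma_w^2).
P = (4.0 / 3.0) / (4.0 / 3.0 + 1.0)
print(normalized_variation(P, 4.0 / 3.0, 1.0, 0.05, 10.0), influence_linear(P, 1.0, 10.0))
```

For a linear estimator the error is affine in $\delta$, so the normalized variation does not depend on $\delta$ and equals the influence function, which grows without bound as $v^2$. A truncated estimator replaces this unbounded growth with a bounded one.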


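$$I_m^o(v) = \sum_{j=0}^{\infty} [(I-C)A^m]^j\, [v^2 M M^T - \sigma_w^2 N]\, [(A^m)^T (I-C)^T]^j$$

where

$$C = \sum_{i=-m+1}^{0} b_i B^T A^{i}, \quad (b_i = \lim_{l \to \infty} b_{i,l}), \qquad M = \sum_{i=-m+1}^{0} b_i, \qquad N = \sum_{i=-m+1}^{0} b_i b_i^T$$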


6. Numerical Results

In this section we present some numerical results regarding the asymptotic performance of the filtering operation in (22), for two special cases of the nominal model:

Model 1: First order autoregressive with autoregressive parameter $a = 0.5$, and $\sigma_x^2 = \sigma_w^2 = 1$.

Model 2: Third order autoregressive with $a_1 = 0.6$, $a_2 = 0.07$, $a_3 = -0.06$, and $\sigma_x^2 = \sigma_w^2 = 1$.
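The nominal optimum for Model 1 can be cross-checked with a short computation. The sketch below iterates the scalar Riccati recursion of the Kalman filter, under the assumption (our reading, not stated in this form in the text) that the two unit variances of Model 1 are those of the autoregressive driving noise and of the additive observation noise. The function name and arguments are hypothetical.

```python
def steady_state_filter_error(a, q, r, iterations=200):
    """Iterate the scalar Riccati recursion of the Kalman filter.

    a: autoregressive parameter, q: driving-noise variance,
    r: observation-noise variance. Returns the filtering error variance.
    """
    p = q  # any nonnegative starting value works for a stable model
    for _ in range(iterations):
        p_pred = a * a * p + q          # one-step prediction error
        p = p_pred * r / (p_pred + r)   # error after the measurement update
    return p


if __name__ == "__main__":
    print(round(steady_state_filter_error(a=0.5, q=1.0, r=1.0), 5))
```

Under this reading the recursion settles at about 0.5311, the positive root of p^2 + 7p - 4 = 0, which agrees with the value 0.53112 quoted with Tables 1 and 7 for the optimal causal filter at the nominal model.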

Tables 1, 2, and 3, and Figures 1 and 2 exhibit the performance of the filtering operation in (22), for various values of the design parameters ε and m, when the nominal model is Model 1. When the nominal model is instead Model 2, the corresponding performance is exhibited in Tables 4, 5, and 6 and Figure 3. Tables 2 and 5 correspond to independent per datum outliers, while Tables 3 and 6 correspond to independent m-size batches of outliers.

Both the upper and the lower bounds of the asymptotic mean squared error at the nominal model (Tables 1 and 4) increase monotonically with the contamination parameter ε, for any fixed m. Moreover, for fixed ε, particularly for small values of ε, the upper bounds of the asymptotic error decrease sharply when m increases, while the corresponding lower bounds experience relatively smaller variations with m. Regarding the breakdown point (Tables 2, 3, 5, and 6), we first observe that both upper and lower bounds increase when ε increases, for any fixed m. For the case of independent per datum outliers, the upper and lower bounds of the breakdown point decrease when m increases. On the contrary, when independent m-size batches of outliers are acting, the lower bounds of the corresponding breakdown point increase with m, while the upper bounds remain practically constant. Finally, the upper and lower bounds of the influence function of the filtering operation in (22) are always monotonically increasing and bounded, as can be seen from Figures 1, 2, and 3. They both reach certain saturation points depending on ε and m, and, for fixed m, these saturation points decrease when ε increases. In all the above cases and for all values of ε, the upper and lower bounds tend to become equal for large m, thus permitting a more accurate evaluation of the performance measures of the filtering operation in (22).
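How quickly the two bounds close can be read off the tabulated values. The sketch below uses the lower and upper bounds listed in Table 1 for contamination 0.01 and prints the ratio of successive gaps; the helper name is hypothetical. The ratios come out near 0.25, which would be consistent with a geometric decay in m at rate a^2 for Model 1 (a = 0.5); reading this as the exponential rate of the bounds is our interpretation of the numbers.

```python
# Lower and upper bounds copied from Table 1, row with contamination 0.01,
# for m = 1, ..., 6.
LOWER = [0.53488, 0.53963, 0.54136, 0.54183, 0.54195, 0.54198]
UPPER = [0.67108, 0.57247, 0.54945, 0.54385, 0.54246, 0.54211]


def gap_ratios(lower, upper):
    """Return the ratios gap(m+1)/gap(m) of the bound widths."""
    gaps = [u - l for l, u in zip(lower, upper)]
    return [nxt / cur for cur, nxt in zip(gaps, gaps[1:])]


if __name__ == "__main__":
    for m, ratio in enumerate(gap_ratios(LOWER, UPPER), start=1):
        print(f"m={m} -> m={m + 1}: ratio {ratio:.3f}")
```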

The filtering operation in (22) can combine close to optimal performance at the nominal model with good protection against outliers. In addition, this operation is more appropriate for protection against independent batches of outliers. Similar results are drawn when the order of the nominal autoregressive model in (20) is some arbitrary integer k.


Using the concepts and methods that we developed in previous sections, we analyzed the asymptotic performance of the filter proposed by Masreliez and Martin when it operates in a stationary environment. In Tables 7 and 8, the asymptotic mean square error bounds and the breakdown point bounds of the latter filter are shown (column B) versus the corresponding bounds for the filter in (22) presented here. In Figures 4 and 5 the same comparison is made for the influence functions of the two filters. Both filters were assumed to operate on the same process, which was taken here to be Model 1, and for m=1. It is observed that the mean square error bounds of the filter in (22) are uniformly better than those of the Masreliez and Martin filter (Table 7), at the expense of lower breakdown points (Table 8) and higher saturation points of the influence functions. However, for m=2, it can be clearly seen from Tables 9 and 10 that the breakdown points of the filter in (22) improve considerably while the mean square error remains small, especially for low contamination levels.

7.  Conclusions 

We designed and analyzed nonlinear filtering and smoothing operations that were found to provide effective resistance to outliers and simultaneously good performance at the nominal Gaussian model. The proposed estimators can be easily implemented, being only slightly more complex (in implementation) than the usual linear estimators. However, the analysis and evaluation of their asymptotic performance were considerably more involved than those for linear estimators, both from a theoretical and a computational point of view.


Due  to  the  nonlinear  recursion  which  is  involved  in  (22),  an  exact  covariance 
recursion  is  not  possible.  So,  it  was  necessary  to  study  the  entire  functional  recursion  of 
probability  distributions.  Then,  we  proved  asymptotic  stationarity  of  the  residual  process 
by  establishing  a  more  general  result  concerning  the  asymptotic  stationarity  of  Markov 
processes  with  Euclidean  state  space. 

For the proposed estimators, strong robustness and good performance at the nominal model are conflicting requirements. The more robust an estimator is, the worse its performance at the nominal model, and vice versa. The tradeoff between robustness and performance has to be adjusted for each particular problem by appropriately varying the design parameters ε and m, according to the specific requirements and the available knowledge about the underlying situation.
APPENDIX


Proof  of  Theorem  1 


We first prove that if the optimization problem

$$\inf_{f_Y \in F_Y(\epsilon)} I(f_Y)$$

has a solution, it is unique. Indeed, let $f_1$, $f_2$ be two $l$-dimensional densities in $F_Y(\epsilon)$ attaining the infimum. Then, since $I(\cdot)$ is convex, any density $f_\delta$ of the form

$$f_\delta = (1-\delta) f_1 + \delta f_2, \quad 0<\delta<1$$

must attain the same infimum. Thus, $I(f_\delta)$ is constant for $0<\delta<1$. It is implied that

$$0 = \frac{d^2 I(f_\delta)}{d\delta^2} = 2\int_{R^l} \frac{[(1-\epsilon) f_{oY}(y^l)\,P^T y^l]^2\,[f_2(y^l)-f_1(y^l)]^2}{[f_\delta(y^l)]^3}\,dy^l \qquad (A.1)$$

where the differentiation under the integral sign is justified by the dominated convergence theorem (observe that $f_\delta \ge (1-\epsilon) f_{oY} > 0$). From (A.1) we conclude that $f_1 = f_2$ a.e. $(dy^l)$, since $P \ne 0$ and the set where $P^T y^l = 0$ is a proper subspace of $R^l$.

We  now  prove 


j,  we  have  the  following  relationships. 


I(fY)  -  I(fY)  = 


[(l-e)pVfoY(y  )]2(fY(y)  -  (l-e)foY(y  )) 


fyCyVyCyS 


dy  + 


[(l-e)pY  foY(y  )]2(fY(y  )  -  f*(y  )) 


fY(y/)fY(y/) 


dy  < 


C(y‘) 


XA  j  (fY(y/ )  -  fY(y/  ))dy^  = 

r1  My> 


*  i  o 

r  (fY(y ))  • 

=  X2  1-  f  dy'  <  0 

J  *  l 

r'  My )  j 


The  inequality  in  (A.2)  follows  from  the  above  relationships. 

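The expressions in (18), determining the value of the constant $\lambda$, evolve from the requirement that $\int f_Y^*(y^l)\,dy^l = 1$.

Finally, the form of the robust estimator $\hat{X}_0^*(y^l)$ is equal to the conditional expectation $E\{X_0 \mid y^l\}$ at the least favorable density $f^*(x_0,y^l)$:

$$\hat{X}_0^*(y^l) = \int_R x_0\,\frac{f^*(x_0,y^l)}{f_Y^*(y^l)}\,dx_0 \qquad (A.3)$$

where,

$$f^*(x_0,y^l) = (1-\epsilon) f_o(x_0,y^l) + \epsilon f_{os}(x_0)\,h^*(y^l) \qquad (A.4)$$

Substituting (A.4) into (A.3) and recalling that $\int_R x_0 f_{os}(x_0)\,dx_0 = 0$ and $\int_R x_0 f_o(x_0,y^l)\,dx_0 = f_{oY}(y^l)\,P^T y^l$, we obtain

$$\hat{X}_0^*(y^l) = \frac{(1-\epsilon)\,P^T y^l\,f_{oY}(y^l)}{f_Y^*(y^l)} = \begin{cases} P^T y^l, & \text{for } |P^T y^l| \le \lambda \\ \lambda\,\mathrm{sgn}(P^T y^l), & \text{for } |P^T y^l| > \lambda \end{cases} \;=\; H(\lambda, P^T y^l).$$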
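The resulting operation is the nominal linear estimate passed through a limiter. A minimal sketch follows, assuming the observation block and the coefficient vector P are given as plain sequences of numbers; the function name is hypothetical.

```python
def clipped_estimate(lam, p, y):
    """H(lam, p^T y): the linear estimate p^T y, limited to [-lam, lam]."""
    linear = sum(pi * yi for pi, yi in zip(p, y))  # inner product p^T y
    if abs(linear) <= lam:
        return linear
    return lam if linear > 0 else -lam  # lam * sgn(p^T y)


if __name__ == "__main__":
    p = [0.5, 0.25]
    print(clipped_estimate(1.0, p, [1.0, 1.0]))   # inside the linear region
    print(clipped_estimate(1.0, p, [10.0, 0.0]))  # limited at +lam
```

A single grossly erroneous observation can therefore move the estimate by at most the limiter level, whereas the unclipped linear estimate grows without bound with the outlier value.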


Proof  of  Theorem  2 


a  _  a  rn  _  a  * 

The  operation  in  (22)  has  the  general  form,  Xn  =  £a.X.  +  g(y  ,  2Ioc.X.,  { } ),  where, 


for  some  bounded,  X, 


g(x)  = 


x  ;  I  x  I  <X 

,  and  where  I  £a.  I  <c,  for  some  given  c>0  .  Therefore, 
Asgnx:  I  x  I  >A,  1 


!  Xn  I  <  X[\+ 1  £a.  1 1  <  Mc+1);  Vn. 
i 


(A. 5) 


A  /.  A. 

Let  E  {[X  -X  j  )  denote  the  mean  squared  error  induced  by  the  estimate  X  ,  when 

\xo  n  nJ  n  J  n 


the  Gaussian  nominal  observation  process  is  acting. 


Let  E  ([X  -X  ]  )  be  the  same 

m  * 1  n  nJ  1 
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c 


error,  when  some  process  in  class  F™  is  acting  instead.  Let  y  and  z  denote  sequences 

n  .  n 

that  are  respectively  generated  by  the  processes  |lQ  and  [i .  Given  some  set  A  in  R  ,  and 
in  connection  with  (A. 5)  and  the  Schwartz  inequality,  we  have, 


' 


«  ' 


E  {[X -X  ]2lzneAn)  =E  {X2|zneAn} -2E  (XX  lzneAn}  + 

n  nJ  ’  n  ’  |ll  n  n 

+  E  {[X  ]2|zneAn}  <  (A. 6) 

\x  nJ  v 

<  c+2E1/2{X2  lzneAn}E1/2([Xn]2 1  zneAn}  +  E^[XJ2 1  zneAn} 

1/2  2  2  1/2  2  A 

<c+2Xc  (c+l)+A.  (c+1)  =  [c  +\(c+l)]  C 

Due  to  (A. 6),  and  considering  ergodic  and  stationary  observation  processes  in 
conjunction  with  fm,  we  obtain:  given  "n>0,  there  exists  nQ,  such  that, 

Vn>no;E^{[Xn-Xnl2}<(l-e+T1)E{[Xn-Xn]2 1 [#i:Ym(z*m,  y^)>el£ne,ynERn}+EC 

(A.7) 

where,  for  independent  m-size  outliers,  there  exists  some  £o>0,  such  that, 

E{[Xn-XJ2 1  [#i:Ym(z™,  y.1++lm)>e]<ne,yneRn}<  (A. 8) 

<E  { [X  -X  ]2}  +  eC;  Ve<e  ,  Vn>n 

H0‘l  n  nJ  ’  o  o 

e 

From  (A.7)  and  (A. 8)  we  conclude:  Given  r\  =  ~ ,  there  exist  no  and  E>0,  such  that, 

2 
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le  [X -X  ]  J-E  {[X  -X  ]  }  I<e(2-— )C+  -  E  {[X-XI)< 

li11  n  n1  ’  u,1'  n  nJ  '  v  '  lx  1 1  n  n*  ' 

2  2 


5  A 

<  e  C  5;  V  n>n  ,¥  e<e 

_  o  o 

2  “ 


Thus,  given  S>0,  there  exist,  n  ,  and  E:0<e<min(e  ,  ),  such  that 

5  C 


n  (|i  ,(i)  <  e  implies  IE  {[X  -X  ]2}-E  ([X  -X  ]2}  I  <5  ;  Yn>n 

n,pn  ~o  ^  r  |x‘l  n  nJ  ’  polL  n  nJ  ’  ’  o 


The proof of the theorem is now complete.


Proof  of  Theorem  3 


From (26) we conclude that $\{X_n\}$ is a Markov process. Thus, to prove asymptotic stationarity, it suffices to show that, given any distribution for $X_0$, the distribution of $X_n$ converges weakly to a unique distribution in $R^k$, as $n\to\infty$.

Let $\mu_0(x)$ be an arbitrary density function, $\forall x\in R^k$. Let then the sequence $\{\mu_n(x), n\ge 0\}$ be defined as follows.

$$\mu_n(x) = \int_{R^k} A(x,\omega)\,\mu_{n-1}(\omega)\,d\omega, \quad n\ge 1 \qquad (A.9)$$

where $A(x,\omega)$ denotes the conditional density function of $x$, given $\omega$, when $x = f(\omega,v)$, and where $\omega$ is independent of $v$, and $p(v)$ is the density function of $v$ at $v\in R^k$. Let us now define the sequence $\{A^{(n)}, n\ge 1\}$, as follows.

$$A^{(1)}(x,\omega) = A(x,\omega), \qquad A^{(n)}(x,\omega) = \int_{R^k} A^{(n-1)}(x,z)\,A(z,\omega)\,dz \qquad (A.10)$$

Then, we can write,

$$\mu_n(x) = \int_{R^k} A^{(n)}(x,\omega)\,\mu_0(\omega)\,d\omega \qquad (A.11)$$

To show weak convergence of the sequence $\{\mu_n(x)\}$, we need to show that there exists a density function $\mu(x)$; $x\in R^k$, such that, for any continuous and bounded function $g(x)$ in $R^k$, we have,

$$\int_{R^k} g(x)\,\mu_n(x)\,dx \;\xrightarrow[n\to\infty]{}\; \int_{R^k} g(x)\,\mu(x)\,dx \qquad (A.12)$$

Let us define the sequence $\{g_n(x), n\ge 0\}$, as follows.

$$g_0(x) = g(x), \qquad g_n(x) = \int_{R^k} A(z,x)\,g_{n-1}(z)\,dz \qquad (A.13)$$

Then,

$$\int_{R^k} g(x)\,\mu_n(x)\,dx = \int_{R^k} g_n(x)\,\mu_0(x)\,dx \qquad (A.14)$$

Let us define,

$$u_n \triangleq \sup_{\delta>0}\Big(\delta^{-1} \sup_{\|z-\omega\|<\delta} |g_n(z)-g_n(\omega)|\Big) \qquad (A.15)$$

Without loss of generality, we will assume that the quantities $\{u_n\}$ are all finite. (This is true if, for example, the functions $g_n(x)$ satisfy a Lipschitz condition.) From (26) and (A.13) we obtain,

$$g_n(x) = \int_{R^k} g_{n-1}(f(x,v))\,p(v)\,dv \qquad (A.16)$$

From (A.15) and (A.16) we conclude,

$$u_n \le \int_{R^k} \Big\{\sup_{\delta>0}\Big(\delta^{-1} \sup_{\|x-\omega\|<\delta} |g_{n-1}(f(x,v))-g_{n-1}(f(\omega,v))|\Big)\Big\}\,p(v)\,dv$$

$$\le \int_{R^k} h(v)\,p(v)\,\Big\{\sup_{\delta>0}\Big([\delta h(v)]^{-1} \sup_{\|x-\omega\|<\delta h(v)} |g_{n-1}(x)-g_{n-1}(\omega)|\Big)\Big\}\,dv$$

$$= u_{n-1}\int_{R^k} h(v)\,p(v)\,dv = c\,u_{n-1}, \quad c<1 \qquad (A.17)$$

From (A.17), we conclude that $u_n \to 0$, as $n\to\infty$, and that $g_n(x) \to$ constant on $R^k$, as $n\to\infty$. Thus,

$$\int_{R^k} g(x)\,\mu_n(x)\,dx = \int_{R^k} g_n(x)\,\mu_0(x)\,dx \;\xrightarrow[n\to\infty]{}\; \text{constant}$$
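The contraction mechanism behind (A.17) can be illustrated numerically. The sketch below does not implement the recursion (26); it assumes a hypothetical scalar transition f(x, v) = a clip(x) + v with |a| < 1, which is Lipschitz in x with constant |a|, so that the role of h(v) is played by the constant |a|. Two chains started from different initial values and driven by the same noise then approach each other geometrically, which is the sample-path counterpart of the initial distribution being forgotten.

```python
import random


def clip(x, lam):
    """Limit x to the interval [-lam, lam]."""
    return max(-lam, min(lam, x))


def step(x, v, a=0.5, lam=2.0):
    """Hypothetical transition f(x, v); Lipschitz in x with constant |a|."""
    return a * clip(x, lam) + v


def coupled_distances(x0, w0, n_steps=20, seed=0):
    """Run two chains from x0 and w0 with the same noise; return |x_n - w_n|."""
    rng = random.Random(seed)
    x, w, out = x0, w0, []
    for _ in range(n_steps):
        v = rng.gauss(0.0, 1.0)          # common driving noise
        x, w = step(x, v), step(w, v)
        out.append(abs(x - w))
    return out


if __name__ == "__main__":
    for n, d in enumerate(coupled_distances(1.5, -1.0), start=1):
        print(n, d)
```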

Due to (27), the sequence $\{\mu_n(x)\}$ is tight. Thus, there exists a subsequence $\{\mu_{n_i}(x)\}$, and a density function $\mu(x)$ in $R^k$, such that, for every continuous and bounded function $g(x)$, we have.
  ε \ m       1          2          3          4          5          6

  0.002    0.53167    0.53284      --       0.53346    0.53350    0.53351
           0.66941    0.56629    0.54159    0.53552    0.53401    0.53364

  0.01     0.53488    0.53963    0.54136    0.54183    0.54195    0.54198
           0.67108    0.57247    0.54945    0.54385    0.54246    0.54211

  0.1      0.58157    0.608293   0.61640    0.01851    0.61904    0.61917
           0.70620    0.63797    0.62328    0.62032    0.61949    0.61929

  0.15     0.60983    0.64401    0.65401    0.65659    0.65723    0.65740
           0.72961    0.67249    0.66099    0.65832    0.65767    0.65750

  0.25     0.66941    0.71376    0.72608    0.72921    0.72999    0.73019
           0.78026    0.73998    0.73249    0.73080    0.73039    0.73028

  0.3      0.70079    0.74848    0.76146    0.76474    0.76556    0.76576
           0.80718    0.77357    0.76758    0.76626    0.76594    0.76586

  0.4      0.76727    0.81887    0.83243    0.83582    0.83667    0.83688
           0.86426    0.84156    0.83795    0.83719    0.83701    0.83697

Table 1

Bounds on the asymptotic mean squared error, at the nominal model.
Model 1. Causal filtering operation in (22).
Asymptotic mean squared error induced by the optimal at the nominal model causal filter = 0.53112
Upper lines: lower bounds.

  ε \ m       1          2          3          4          5          6

  0.002    0.09928    0.06814    0.04932    0.03788    0.03056    0.02556
           0.14352    0.07476    0.05048    0.03811    0.03060    0.02557

  0.01     0.14699    0.10040    0.07274    0.05597    0.04522    0.03786
           0.20676    0.10942    0.07433    0.05628    0.04528    0.03788

  0.1      0.32204    0.21602    0.15715    0.12180    0.09898    0.08326
           0.40878    0.22978    0.15974    0.12228    0.09908    0.08328

  0.15     0.38225    0.25595    0.18674    0.14516    0.11824    0.09962
           0.47011    0.27034    0.18937    0.14568    0.11835    0.09964

  0.25     0.48129    0.32349    0.23761    0.18576    0.15194    0.12840
           0.56488    0.33815    0.24036    0.18631    0.15205    0.12842

  0.3      0.52466    0.35423    0.26119    0.20477    0.16783    0.14203
           0.60450    0.36875    0.26395    0.20533    0.16795    0.14206

  0.4      0.60417    0.41329    0.30739    0.24246    0.19955    0.16937
           0.67478    0.42723    0.31012    0.24301    0.19967    0.16940

Table 2

Bounds on the breakdown point.
Model 1. Causal filtering operation in (22). Independent per datum outliers.
Upper lines: lower bounds.




ra 

1 

2 

3 

I 

4  j  5 

6 
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0.13164 
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0.14080 
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0.14394  '  0.14394 
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0.01 
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0.20274 
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0.20580  [  0.20656 
0.20682  1  0.20682 

0.20676 
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0.1 
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0.38537 
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0.40520 
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0.40618 
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0.15 
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0.47011 

0.44639 

0.46759 

0.46212 

0.46733 

0.46601 
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0.56195 
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0.56166 

0.56045 

0.56164 

0.56134 

0.56164 

0.56156 

0.56164 

0.3 

0.52466 

0.60450 

0.58298 

0.60152 

0.56672 

0.60124 

0.60010 

0.60121 

0.60093 

0.60121 

0.60114 

0.60121 

0.4 

0.60417 

0.67478 

0.65577 

0.67193 

0.66775 

0.67166 

0.67067 

0.67164 

0.67140 

0.67164 

0.67158 

0.67164 

Table  3 

Bounds  on  the  breakdown  point. 

Model 1. Causal filtering operation in (22). Independent
size-m batches of outliers.

Upper  lines:  lower  bounds. 


  ε \ m       1          2          3          4          5          6

  0.002    0.55402    0.57594    0.59937    0.61040    0.61445    0.61566
           0.83214    0.68407    0.63361    0.62154    0.61764    0.61658

  0.01     0.57548    0.62504    0.66518    0.68180    0.68763    0.68936
           0.86214    0.73994    0.70200    0.69383    0.69109    0.69035

  0.1      0.62110    0.69155    0.72865    0.74097    0.74499    0.74615
           0.89436    0.79589    0.76040    0.75110    0.74788    0.74698

  0.15     0.65204    0.72942    0.74120    0.77011    0.77401    0.77432
           0.94013    0.83110    0.79568    0.78320    0.77516    0.77501

  0.25     0.69875    0.73479    0.76678    0.78133    0.79002    0.79012
           0.95182    0.86264    0.80203    0.79312    0.79202    0.79136

  0.3      0.73478    0.73930    0.78033    0.79300    0.80400    0.80511
           0.96067    0.91011    0.86481    0.82414    0.80923    0.80547

  0.4      0.73510    0.74902    0.79087    0.81142    0.82267    0.82320
           0.97033    0.91437    0.86690    0.83571    0.82610    0.82359

Table 4

Bounds on the asymptotic mean squared error at the nominal model.
Model 2. Causal filtering operation in (22). Asymptotic mean squared error induced by the optimal at the nominal model causal filter = 0.5A731.
Upper lines: lower bounds.




  ε \ m       1          2          3          4          5          6

  0.002    0.07594    0.05802    0.04513    0.035395   0.028765   0.02411
           0.13890    0.07501    0.04960    0.035980   0.029010   0.02486

  0.01     0.11029    0.08225    0.06334    0.04958    0.04030    0.03380
           0.20020    0.11510    0.08010    0.05156    0.0450     0.03388

  0.1      0.25689    0.18640    0.14353    0.11313    0.09248    0.07790
           0.39537    0.22540    0.15003    0.11804    0.09424    0.07823

  0.15     0.32899    0.23552    0.18083    0.14286    0.11705    0.09878
           0.47563    0.27100    0.20242    0.14811    0.11829    0.09890

  0.25     0.47094    0.33123    0.25350    0.20121    0.16563    0.12747
           0.60225    0.39693    0.26089    0.20541    0.16735    0.12784

  0.3      0.53838    0.37811    0.28952    0.23047    0.19020      --
           0.65802    0.42004    0.29457    0.23215    0.19082      --

  0.4      0.66166    0.47002    0.36191    0.29019    0.24090    0.20548
           0.75106    0.52401    0.39102    0.30016    0.24210    0.20602

Table 5

Bounds on the breakdown point.
Model 2. Causal filtering operation in (22). Independent per datum outliers.
Upper lines: lower bounds.


  ε \ m       1          2          3          4          5          6

  0.002    0.07594    0.11269    0.12939    0.13424    0.13578    0.13622
           0.13890    0.13995    0.14500    0.13952    0.13595    0.13682

  0.01     0.11029    0.15774    0.17826    0.18406    0.18592    0.18643
           0.20020    0.19500    0.19851    0.18820    0.18683    0.18682

  0.1      0.25689    0.33813    0.37173    0.38136    0.38444    6.28530
           0.39537    0.39220    0.39104    0.38804    0.38740    0.38607

  0.15     0.32899    0.41556    0.45031    0.46022    0.46336    0.46424
           0.47563    0.47215    0.46903    0.46630    0.46502    0.46482

  0.25     0.47094    0.55275    0.58401    0.59287    0.59561    0.59820
           0.60225    0.60112    0.60039    0.60004    0.59970    0.59918

  0.3      0.53838    0.61325    0.64137    0.64932    0.65244      --
           0.65802    0.65720    0.65695    0.65530    0.65307      --

  0.4      0.66166    0.71912    0.74020    0.74616    0.74795    0.74845
           0.75106    0.75083    0.75010    0.74970    0.74912    0.74887

Table 6

Bounds on the breakdown point.
Model 2. Causal filtering operation in (22). Independent size-m batches of outliers.
Upper lines: lower bounds.
    ε          A          B

  0.002    0.53167    0.53283
           0.66941    0.66992

  0.01     0.53488    0.53934
           0.67108    0.67378

  0.1      0.58157    0.60346
           0.70620    0.72394

  0.15     0.60983    0.63695
           0.72961    0.75215

  0.25     0.66941    0.70312
           0.78026    0.80876

  0.3      0.70079    0.73546
           0.80718    0.83740

  0.4      0.76727    0.80487
           0.86426    0.89603

Table 7

Comparison of asymptotic mean square error bounds between filtering operation in (22) and the filter by Masreliez and Martin. Model 1. Optimal at the nominal: 0.53112
A: Causal filtering operation in (22). m=1
B: Filter by Masreliez and Martin
Upper lines: lower bounds.
    ε          A          B

  0.002    0.09928    0.12240
           0.14352    0.17464

  0.01     0.14699    0.17853
           0.20676    0.24633

  0.1      0.32204    0.36890
           0.40878    0.45648

  0.15     0.38225    0.42999
           0.47011    0.51629

  0.25     0.48129    0.52707
           0.56488    0.60631

  0.3      0.52466    0.56851
           0.60450    0.64324

  0.4      0.60417    0.64312
           0.67478    0.70798

Table 8

Comparison of breakdown point bounds. Model 1
A: Causal filtering operation in (22). m=1.
B: Filter by Masreliez and Martin.
Upper lines: lower bounds.
    ε          A          B

  0.002    0.53284    0.53283
           0.56629    0.66992

  0.01     0.53963    0.53934
           0.57247    0.67378

  0.1      0.60829    0.60346
           0.63797    0.72394

  0.15     0.64401    0.63695
           0.67249    0.75215

  0.25     0.71376    0.70312
           0.73998    0.80876

  0.3      0.74848    0.73646
           0.77357    0.83740

  0.4      0.81887    0.80487
           0.84156    0.89603

Table 9

Comparison of asymptotic mean square error bounds between filtering operation in (22) and the filter by Masreliez and Martin.
Model 1. Optimal at the nominal error: 0.
A: Filtering operation in (22). m=2.
B: Filter by Masreliez and Martin.
Upper lines: lower bounds.
    ε          A          B

  0.002    0.13164    0.12240
           0.14394    0.17464

  0.01     0.19073    0.17853
           0.20686    0.24633

  0.1      0.38537    0.36890
           0.40676    0.45648

  0.15     0.44693    0.42999
           0.46759    0.51629

  0.25     0.54234    0.52707
           0.56195    0.60631

  0.3      0.58298    0.56851
           0.60152    0.64324

  0.4      0.65577    0.64312
           0.67193    0.70798

Table 10

Comparison of breakdown point bounds.
Model 1. Size-m batches of outliers.
A: Causal filtering operation in (22)
B: Filter by Masreliez and Martin.
Upper lines: lower bounds.
Figure 1

Bounds on the Influence Function
Model 1. Causal filtering operation in (22).
ε=0.002
I°(v): Influence function induced by the optimal at the nominal model filter.

Figure 2

Bounds on the Influence Function
Model 1. Causal filtering operation in (22).
ε=0.01
I°(v): Influence function induced by the optimal at the nominal model filter.

Figure 3

Bounds on the Influence Function
Model 2. Causal filtering operation in (22).
ε=0.01
I°(v): Influence function induced by the optimal at the nominal model filter.
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